
    The Simplest Analysis Method for Non-stationary Sinusoidal Modeling

    International audience
    This paper introduces an analysis method based on the generalization of the phase vocoder approach to non-stationary sinusoidal modeling. The method is compared with the reassignment method for the estimation of all the model parameters (phase, amplitude, frequency, amplitude modulation, and frequency modulation), and with the Cramér–Rao bounds. Its performance turns out to be comparable to the state of the art, with the great advantage of being much simpler.
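
    The classic stationary idea that this paper generalizes can be illustrated with a minimal sketch. It assumes a single stationary sinusoid, a periodic Hann window, and a hop of one sample. Under these assumptions, the phase advance of the peak bin between two frames one sample apart equals the angular frequency in rad/sample. This is not the paper's non-stationary estimator, which also recovers the amplitude and frequency modulations. The function names and the naive O(N²) DFT are illustrative choices.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def windowed_dft_bin(frame, window, k):
    """DFT of frame * window at bin k, using the e^{-j 2 pi k n / N} convention."""
    n_len = len(frame)
    return sum(
        frame[n] * window[n] * cmath.exp(-2j * math.pi * k * n / n_len)
        for n in range(n_len)
    )


def phase_vocoder_frequency(x, fs, n_len=512, start=0):
    """Estimate the frequency (Hz) of the strongest sinusoid near x[start].

    Stationary phase-vocoder estimate with a one-sample hop: for a sinusoid
    of angular frequency w0, the peak-bin value of the second frame equals
    that of the first frame multiplied by e^{j w0}, so their phase difference is w0.
    """
    w = hann(n_len)
    frame0 = x[start:start + n_len]
    frame1 = x[start + 1:start + 1 + n_len]
    # Peak bin on the first frame, positive frequencies only (bins 1 .. N/2 - 1).
    mags = [abs(windowed_dft_bin(frame0, w, k)) for k in range(1, n_len // 2)]
    k_peak = 1 + mags.index(max(mags))
    x0 = windowed_dft_bin(frame0, w, k_peak)
    x1 = windowed_dft_bin(frame1, w, k_peak)
    # Principal value of the phase advance, in [-pi, pi] rad/sample.
    omega = cmath.phase(x1 * x0.conjugate())
    return omega * fs / (2 * math.pi)


fs = 8000.0
x = [math.cos(2 * math.pi * 1234.5 * n / fs + 0.3) for n in range(600)]
print(round(phase_vocoder_frequency(x, fs), 2))  # close to 1234.5
```

    With 512-sample frames at 8 kHz, the bin spacing is about 15.6 Hz. The phase difference nevertheless recovers the frequency of a real cosine to within a small fraction of a hertz, because the only error comes from the window's leakage of the negative-frequency component.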

    IBISA: Making Image-Based Identification of Ancient Coins Robust to Lighting Conditions

    International audience
    The IBISA (Image-Based Identification/Search for Archaeology) system manages databases of digital images of archaeological objects, e.g. ancient coins, and allows the user to search by example. IBISA was designed to help users decide, from their images, whether two objects (coins) are the same, come from the same matrix (die), share a stylistic resemblance, or are completely different. The system searches the databases for similarities using a registration method that must be resilient to the viewing conditions. Because it is based on the Fourier transform, this method cancels rigid transformations between images, and sub-pixel accuracy can be achieved with a very simple technique. However, lighting conditions remain an issue. Fortunately, the registration method can be extended to a light-independent model by considering elevation or normal maps instead of intensity. Such a model is also useful for interactive visualization and museography. Although model-based registration is then resilient to all viewing conditions, it is impractical in real scenarios where the target is a single image, from which a model can hardly be derived. Finally, a hybrid approach is investigated, with a target image but a model of the reference. It is more realistic and resilient to lighting conditions, and it gives excellent results for translations but shows limitations for rotations.
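
    A common way for a Fourier-based registration to cancel translations is phase correlation. The normalized cross-power spectrum of two images is a pure linear phase ramp, and its inverse transform peaks at the shift. The minimal sketch below assumes grayscale images stored as nested lists, circular integer shifts, and a naive DFT. It does not cover rotation, which is often handled by resampling the magnitude spectrum on a log-polar grid, or sub-pixel refinement. It is not claimed to be IBISA's exact procedure.

```python
import cmath
import math


def dft1(seq, sign):
    """Naive 1-D DFT; sign=-1 for the forward transform, +1 for the (unscaled) inverse."""
    n_len = len(seq)
    return [
        sum(seq[n] * cmath.exp(sign * 2j * math.pi * k * n / n_len) for n in range(n_len))
        for k in range(n_len)
    ]


def dft2(img, inverse=False):
    """Separable 2-D DFT of a list-of-rows image (naive, fine for small examples)."""
    sign = 1 if inverse else -1
    rows, cols = len(img), len(img[0])
    by_rows = [dft1(row, sign) for row in img]
    by_cols = [dft1([by_rows[r][c] for r in range(rows)], sign) for c in range(cols)]
    out = [[by_cols[c][r] for c in range(cols)] for r in range(rows)]
    if inverse:
        out = [[v / (rows * cols) for v in row] for row in out]
    return out


def phase_correlation(ref, moved, eps=1e-12):
    """Return the circular shift (dy, dx) such that moved[y][x] == ref[y - dy][x - dx]."""
    f_ref, f_moved = dft2(ref), dft2(moved)
    # Normalized cross-power spectrum: only the phase difference is kept, so a
    # pure translation becomes the linear phase ramp exp(-2j*pi*(u*dy/H + v*dx/W)).
    cross = [
        [b * a.conjugate() / (abs(b * a.conjugate()) + eps) for a, b in zip(ra, rb)]
        for ra, rb in zip(f_ref, f_moved)
    ]
    surface = dft2(cross, inverse=True)  # ideally a unit impulse at (dy, dx)
    rows, cols = len(surface), len(surface[0])
    return max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: surface[rc[0]][rc[1]].real,
    )
```

    For example, if `moved` is `ref` circularly shifted by 3 rows and 7 columns, `phase_correlation(ref, moved)` should return `(3, 7)`. Discarding the spectral magnitude is what makes the peak sharp and independent of the image content.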

    Fourier-based Methods for the Spectral Analysis of Musical Sounds

    International audience
    When dealing with musical sounds, the short-time Fourier transform prevails, and sinusoids play a key role according to both acoustics (vibrating modes) and psychoacoustics (pure tones). The values obtained when decomposing the signal on time-frequency atoms are usually assigned to the atoms' geometrical centers, which leads to estimation errors for the sinusoidal parameters. To correct this, one can exploit the amplitude or phase information, or use the derivatives of either the analysis window or the audio signal. This leads to three equally efficient methods: the phase vocoder, spectral reassignment, and the derivative algorithm. These are in fact different formulations of the best analysis method based on the Fourier spectrum.
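
    Spectral reassignment can illustrate one of these formulations. It corrects the bin frequency ω_k with the ratio of two windowed spectra, ω̂ = ω_k − Im(X_w'(ω_k) / X_w(ω_k)), where w' is the time derivative of the analysis window. The minimal sketch below assumes a single stationary sinusoid, a periodic Hann window with its analytic derivative, and a naive DFT. Sign conventions vary between references; the one used here matches the e^{−jωn} DFT in the code.

```python
import cmath
import math


def reassigned_frequency(x, fs):
    """Frequency (Hz) of the dominant sinusoid in frame x, by spectral reassignment.

    omega_hat = omega_k - Im(X_dw[k] / X_w[k]), where X_w is the DFT of the frame
    weighted by a periodic Hann window w, and X_dw the DFT weighted by dw/dn.
    """
    n_len = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]
    # Analytic time derivative of the Hann window, per sample.
    dw = [(math.pi / n_len) * math.sin(2 * math.pi * n / n_len) for n in range(n_len)]

    def dft_bin(weights, k):
        return sum(
            x[n] * weights[n] * cmath.exp(-2j * math.pi * k * n / n_len)
            for n in range(n_len)
        )

    # Peak bin of the Hann-windowed spectrum, positive frequencies only.
    mags = [abs(dft_bin(w, k)) for k in range(1, n_len // 2)]
    k_peak = 1 + mags.index(max(mags))
    omega_k = 2 * math.pi * k_peak / n_len
    ratio = dft_bin(dw, k_peak) / dft_bin(w, k_peak)
    omega_hat = omega_k - ratio.imag  # reassigned angular frequency, rad/sample
    return omega_hat * fs / (2 * math.pi)


fs = 8000.0
frame = [math.cos(2 * math.pi * 1234.5 * n / fs + 0.3) for n in range(512)]
print(round(reassigned_frequency(frame, fs), 2))  # close to 1234.5; bin spacing is 15.625 Hz
```

    The correction follows from integrating by parts. For x[n] = e^{jω₀n}, the derivative-window spectrum equals −j(ω₀ − ω) times the plain windowed spectrum, because the Hann window vanishes at both ends of the frame.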

    Informed Separation of Spatial Images of Stereo Music Recordings Using Second-Order Statistics

    International audience
    In this work we address a reverse audio engineering problem, namely the separation of the stereo tracks of professionally produced music recordings. More precisely, we apply a spatial filtering approach with a quadratic constraint using an explicit source-image-mixture model. The model parameters are "learned" from a given set of original stereo tracks, reduced in size, and then used to demix the desired tracks from a preexisting mixture at the best possible quality. Our approach requires a side-information rate of 10 kbps per source or channel and has low computational complexity. The results obtained on the SiSEC 2013 dataset are intended to serve as a reference for comparison with unpublished approaches.
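
    The abstract does not spell out the filter. The following minimal sketch therefore swaps in a standard spatial filter built from second-order statistics, the multichannel Wiener filter, for illustration only. In each time-frequency bin, the image of source j is estimated as R_j (Σ_k R_k)^-1 x, where x is the stereo mixture and R_j is the 2×2 spatial covariance of source image j. This is not the paper's quadratically constrained filter, and all names are hypothetical. In an informed setting, the covariances would presumably be rebuilt from the transmitted model parameters.

```python
def rank_one_cov(steering, variance):
    """Spatial covariance variance * a a^H of a point source with stereo steering vector a."""
    return [[variance * steering[i] * steering[j].conjugate() for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def wiener_images(mix, covs, reg=1e-9):
    """Multichannel Wiener estimates of each source's stereo image in one TF bin.

    mix  : [left, right] complex STFT coefficients of the mixture
    covs : one 2x2 covariance matrix R_j per source image
    Returns [R_j (sum_k R_k + reg*I)^-1 mix for each j].
    """
    total = [[reg if i == j else 0.0 for j in range(2)] for i in range(2)]
    for r in covs:
        total = [[total[i][j] + r[i][j] for j in range(2)] for i in range(2)]
    inv_total = inv2(total)
    estimates = []
    for r in covs:
        g = matmul2(r, inv_total)  # 2x2 spatial filter for this source
        estimates.append([g[0][0] * mix[0] + g[0][1] * mix[1],
                          g[1][0] * mix[0] + g[1][1] * mix[1]])
    return estimates
```

    Because the filters of all sources sum to the identity (up to the small regularization), the estimated images add back up to the mixture. With rank-one covariances and linearly independent steering vectors, a mixture bin that contains only one source is assigned entirely to that source.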

    On the Informed Source Separation Approach for Interactive Remixing in Stereo

    International audience
    Informed source separation (ISS) has become a popular trend in the audio signal processing community over the past few years. Its purpose is to decompose a mixture signal into its constituent parts at the desired or best possible quality level, given some metadata. In this paper we compare two ISS systems and relate the ISS approach, in various configurations, to conventional coding of separate tracks for interactive remixing in stereo. The compared systems are Underdetermined Source Signal Recovery (USSR) and Enhanced Audio Object Separation (EAOS); the latter is part of MPEG's Spatial Audio Object Coding technology. Performance is evaluated using objective difference grades computed with PEMO-Q. The results suggest that USSR performs perceptually better than EAOS and has lower computational complexity.

    Informed Multiple-F0 Estimation Applied to Monaural Audio Source Separation

    International audience
    This paper proposes a new informed source separation technique that combines music transcription with source separation. The presented system is based on a coder/decoder configuration in which a classic (non-informed) multiple-F0 estimation is applied to each separated source signal, assumed to be known at the coder before mixing. The extra information required to recover the reference transcription of each isolated instrument is then computed and inaudibly embedded into the mixture using a watermarking technique. At the decoder, where the original source signals are unknown, instruments are separated from the mixture using the informed transcription of each source signal. We show that a classic (non-informed) F0 estimator can be used to reduce the number of bits needed to transmit the exact transcription of each isolated instrument.

    Synthetic Transaural Audio Rendering (STAR): a Perceptive Approach for Sound Spatialization

    International audience
    The principles of Synthetic Transaural Audio Rendering (STAR) were first introduced at DAFx-06. STAR is a perceptive approach to sound spatialization, whereas state-of-the-art methods are rather physical. With STAR, we focus neither on the wave field (as HOA does) nor on the sound wave (as VBAP does), but rather on the acoustic paths traveled by the sound to the listener's ears. The STAR method consists of canceling the cross-talk signals between two loudspeakers and the listener's ears (in a transaural way), using acoustic paths that are not measured but computed by a model (hence "synthetic"). Our model is based on perceptual cues used by the human auditory system for sound localization. The aim is to give the listener the sensation of each source's position, not to reconstruct the corresponding acoustic wave or field. This should work with various loudspeaker configurations and with a large sweet spot, since the model is neither specialized for a specific configuration nor individualized for a specific listener. Experimental tests were conducted in 2015 and 2019 with different rooms and audiences, for still, moving, and polyphonic musical sounds. They show that the proposed method is competitive with state-of-the-art methods. However, this is a work in progress, and further work is needed to improve the quality.
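
    At a single frequency, transaural cross-talk cancellation amounts to inverting the 2×2 matrix H of loudspeaker-to-ear transfer functions. Feeding the loudspeakers with H^-1 d delivers the desired signals d at the ears. STAR computes H from a perceptual model. The minimal sketch below instead uses a hypothetical free-field path model (1/r attenuation and r/c delay) and an illustrative listener geometry, neither of which comes from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def free_field_path(speaker, ear, freq):
    """Hypothetical path model: point source with 1/r attenuation and r/c delay."""
    r = math.dist(speaker, ear)
    return cmath.exp(-2j * math.pi * freq * r / SPEED_OF_SOUND) / r


def transaural_filters(speakers, ears, freq):
    """Return (H, C) at one frequency, with H[e][s] the path from speaker s to ear e
    and C = H^-1 the cross-talk canceller mapping desired ear signals to speaker feeds."""
    h = [[free_field_path(speakers[s], ears[e], freq) for s in range(2)] for e in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]  # a small det means an ill-conditioned inversion
    c = [[h[1][1] / det, -h[0][1] / det], [-h[1][0] / det, h[0][0] / det]]
    return h, c


def apply2(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


speakers = [(-1.0, 1.7), (1.0, 1.7)]  # x, y in meters (illustrative layout)
ears = [(-0.0875, 0.0), (0.0875, 0.0)]
h, c = transaural_filters(speakers, ears, 1000.0)
feeds = apply2(c, [1.0, 0.0])  # signal meant for the left ear only
at_ears = apply2(h, feeds)     # approximately [1, 0]: cross-talk canceled
```

    At low frequencies, the ipsilateral and contralateral paths become nearly identical, so the determinant approaches zero. Practical systems therefore regularize this inversion rather than using the exact inverse shown here.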

    Coupled oxidation–reduction of butanol–hexanal by resting Rhodococcus erythropolis NCIMB 13064 cells in liquid and gas phases

    Rhodococcus erythropolis is a promising Gram-positive bacterium capable of numerous bioconversions, including those involving alcohol dehydrogenases (ADHs). In this work, we compared and optimized the redox biocatalytic performance of 1-butanol-grown R. erythropolis NCIMB 13064 cells in the aqueous phase and in a non-conventional gas phase, using the 1-butanol–hexanal oxidation–reduction as a model reaction. Oxidation of 1-butanol to butanal is tightly coupled to the reduction of hexanal to 1-hexanol at the level of a nicotinoprotein–ADH-like enzyme. Cell viability is dispensable for the reaction. In aqueous batch conditions, fresh and lyophilized cells are efficient redox catalysts (oxidation–reduction rate = 76 µmol min⁻¹ (g cell dry mass)⁻¹) and are also reactive towards benzyl alcohol, (S)-2-pentanol, and geraniol as reductants. However, butanol–hexanal oxidation–reduction is strongly limited by product accumulation and by hexanal toxicity, which is a major factor influencing cell behavior and performance. The reaction rate is maximal at 40 °C and pH 7.0 in the aqueous phase, and at 60 °C and pH 7.0–9.0 in the gas phase. Importantly, lyophilized cells also proved to be promising redox catalysts in the gas phase (at least 65 µmol min⁻¹ (g cell dry mass)⁻¹). The system is notably stable for several days at moderate thermodynamic activities of hexanal (0.06–0.12), 1-butanol (0.12), and water (0.7).

    First-order ambisonic coding with quaternion-based interpolation of PCA rotation matrices

    International audience
    Conversational applications such as telephony are mostly restricted to mono. With the emergence of VR/XR applications and new products with spatial audio, there is a need to extend traditional voice and audio codecs to enable immersive communication.
    The present work is motivated by recent activities in 3GPP standardization around the development of a new codec called immersive voice and audio services (IVAS). The IVAS codec will address a wide variety of use cases, e.g. immersive telephony, spatial audio conferencing, and live content sharing. There are two main design goals for IVAS. The first is the versatility of the codec in terms of input (scene-based, channel-based, and object-based audio) and output (mono, stereo, binaural, and various multichannel loudspeaker setups). The second is to reuse and extend the enhanced voice services (EVS) mono codec as much as possible.
    In this work, we focus on the first-order ambisonic (FOA) format. Thanks to the flexibility of the underlying sound field decomposition, FOA is a good candidate for the internal representation of an immersive audio codec at low bit rates. We propose a new coding method that can extend existing core codecs such as EVS. It consists of adaptively pre-processing the ambisonic components prior to multi-mono coding by a core codec.
    The first part of this work investigates the basic multi-mono coding approach for FOA, which is used, for instance, in the Opus codec (in the so-called channel mapping family 2). In this approach, the ambisonic components are coded separately with different instances of the (mono) core codec. We present the results of a subjective test (MUSHRA) showing that this direct approach is not satisfactory for low-bitrate coding: the signal structure is degraded, which produces many spatial artifacts (e.g. wrong panning, ghost sources). In the second part of this work, we propose a new method that exploits the correlation between ambisonic components. The pre-processing (prior to multi-mono coding) operates in the time domain. This allows maximum compatibility with many codecs, especially low-bitrate codecs such as EVS and Opus, and minimizes extra delay.
    The proposed method applies Principal Component Analysis (PCA) on a 20 ms frame basis. For each frame, eigenvectors are computed, and the eigenvector matrix is defined as a 4D rotation matrix. In complex sound scenes (with many audio sources or sudden changes), rotation parameters may change dramatically between consecutive frames, and audio sources may move from one principal component to another. This may cause discontinuities or other artifacts. Solutions such as interpolating the eigenvectors (after inter-frame realignment) are not optimal. In the proposed method, two complementary techniques ensure smooth transitions between inter-frame PCA rotations. The first is an algorithm that matches eigenvectors between the current and previous frames, which avoids signal inversions and permutations across frames. The second is an interpolation of the 4D rotation matrices in the quaternion domain. We use the Cayley factorization of a 4D rotation matrix into a double quaternion for the current and previous frames, and apply quaternion spherical linear interpolation (QSLERP) on a subframe basis. The interpolated rotation matrices are then applied to the ambisonic components, and the decorrelated components are coded with a multi-mono approach.
    We present the results of a subjective evaluation (MUSHRA) showing that the proposed method brings significant improvements over the naive multi-mono method, especially in terms of spatial quality.
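
    The eigenvector matching step can be sketched as follows. The columns of the current frame's eigenvector matrix are reordered and sign-flipped so that they follow the previous frame's columns as closely as possible. This minimal sketch assumes real row-major matrices whose columns are the eigenvectors, and uses an exhaustive search over the 4! = 24 permutations. The search criterion (summed absolute inner products) is an illustrative choice, not necessarily the paper's.

```python
from itertools import permutations


def columns(m):
    """Columns of a row-major square matrix."""
    return [[row[j] for row in m] for j in range(len(m))]


def dot(a, b):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_eigenvectors(v_prev, v_cur):
    """Permute and sign-flip the columns of v_cur so they best follow v_prev.

    Exhaustive search over all column permutations (24 for FOA), maximizing the
    summed absolute inner product with the previous frame's eigenvectors.
    """
    prev_cols, cur_cols = columns(v_prev), columns(v_cur)
    n = len(prev_cols)
    sim = [[abs(dot(p, c)) for c in cur_cols] for p in prev_cols]
    best = max(permutations(range(n)), key=lambda perm: sum(sim[i][perm[i]] for i in range(n)))
    matched = []
    for i, j in enumerate(best):
        col = cur_cols[j]
        if dot(prev_cols[i], col) < 0:  # flip to avoid a polarity inversion
            col = [-x for x in col]
        matched.append(col)
    return [[matched[j][i] for j in range(n)] for i in range(n)]  # back to row-major
```

    Permuting and flipping columns can change the determinant from +1 to −1. Since the method treats the eigenvector matrix as a rotation, a complete implementation would also need to restore a determinant of +1 (for example by negating one column). This sketch leaves that step out.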
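
    The quaternion-domain interpolation relies on the fact that every 4D rotation can be written as x ↦ q_L x q_R. Here x is a 4-vector read as a quaternion, and q_L and q_R are unit quaternions. The pair is defined only up to a common sign. The minimal sketch below assumes the factorization has already been computed (extracting q_L and q_R from a matrix is not shown). It interpolates each factor with SLERP and rebuilds the 4×4 matrix. The q_L x q_R convention, as opposed to q_L x conj(q_R), and all function names are assumptions.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 (t=0) and q1 (t=1).

    Undefined for antipodal inputs (q1 == -q0); that case is not handled here.
    """
    cos_omega = max(-1.0, min(1.0, sum(a * b for a, b in zip(q0, q1))))
    if cos_omega > 0.9995:  # nearly identical: normalized linear interpolation is stable
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
    else:
        omega = math.acos(cos_omega)
        s0 = math.sin((1 - t) * omega) / math.sin(omega)
        s1 = math.sin(t * omega) / math.sin(omega)
        q = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(a * a for a in q))
    return tuple(a / norm for a in q)


def rotation_matrix(q_left, q_right):
    """Row-major 4x4 matrix of x -> q_left * x * q_right."""
    basis = [tuple(1.0 if i == j else 0.0 for j in range(4)) for i in range(4)]
    cols = [qmul(qmul(q_left, e), q_right) for e in basis]
    return [[cols[j][i] for j in range(4)] for i in range(4)]


def interpolate_rotation(pair0, pair1, t):
    """Interpolate between two 4D rotations given as (q_left, q_right) pairs."""
    (l0, r0), (l1, r1) = pair0, pair1
    # (q_l, q_r) and (-q_l, -q_r) give the same rotation: flip both together
    # (never just one, which would negate the matrix) to stay on the shorter path.
    if sum(a * b for a, b in zip(l0, l1)) + sum(a * b for a, b in zip(r0, r1)) < 0:
        l1 = tuple(-a for a in l1)
        r1 = tuple(-a for a in r1)
    return rotation_matrix(slerp(l0, l1, t), slerp(r0, r1, t))
```

    On a subframe basis, t would step from 0 to 1 across the frame, for instance t = (m + 1)/M for subframe m of M. The abstract does not state the number of subframes. Each interpolated matrix is orthogonal by construction because both factors remain unit quaternions.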
